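Node classification using text-based node attributes has many real-world applications, ranging from predicting paper topics in academic citation graphs to classifying user characteristics in social media networks. State-of-the-art node classification frameworks, such as GIANT, use a two-stage pipeline: first embedding the text attributes of graph nodes, then feeding the resulting embeddings into a node classification model. In this paper, we eliminate these two stages and instead develop an end-to-end node classification model that builds upon GIANT, called End-to-End-GIANT (E2EG). The tandem use of a main and an auxiliary classification objective in our approach results in a more robust model, which allows the BERT backbone to be switched out for a distilled encoder with 25%-40% fewer parameters. Moreover, the end-to-end nature of the model improves ease of use, since it avoids the need to chain multiple models for node classification. Compared to a GIANT+MLP baseline on the ogbn-arxiv and ogbn-products datasets, our model obtains slightly better accuracy in the transductive setting (+0.5%) while reducing model training time by up to 40%. Our model is also applicable in the inductive setting, where it outperforms GIANT+MLP by up to +2.23%.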
translated by Google Translate
There is no settled universal 3D representation for geometry; the many alternatives include point clouds, meshes, implicit functions, and voxels, to name a few. In this work, we present a new, compelling alternative for representing shapes using a sequence of cross-sectional closed loops. The loops across all planes form an organizational hierarchy which we leverage for autoregressive shape synthesis and editing. Loops are a non-local description of the underlying shape, as simple loop manipulations (such as shifts) result in significant structural changes to the geometry. This is in contrast to manipulating local primitives such as points in a point cloud or a triangle in a triangle mesh. We further demonstrate that loops are an intuitive and natural primitive for analyzing and editing shapes, both computationally and for users.
Adversarial machine learning has recently been both a major concern and a hot topic, especially given the ubiquitous use of deep neural networks. Adversarial attacks and defenses are usually likened to a cat-and-mouse game in which defenders and attackers evolve over time. On one hand, the goal is to develop strong and robust deep networks that are resistant to malicious actors. On the other hand, in order to achieve that, we need to devise even stronger adversarial attacks to challenge these defense models. Most existing attacks employ a single $\ell_p$ distance (commonly, $p\in\{1,2,\infty\}$) to define the concept of closeness and perform steepest gradient ascent w.r.t. this $p$-norm to update all pixels in an adversarial example in the same way. Each of these $\ell_p$ attacks has its own pros and cons, and no single attack can successfully break through defense models that are robust against multiple $\ell_p$ norms simultaneously. Motivated by these observations, we come up with a natural approach: combining various $\ell_p$ gradient projections on a pixel level to achieve a joint adversarial perturbation. Specifically, we learn how to perturb each pixel to maximize the attack performance while maintaining the overall visual imperceptibility of adversarial examples. Finally, through various experiments with standardized benchmarks, we show that our method outperforms most current strong attacks across state-of-the-art defense mechanisms while keeping the adversarial examples visually clean.
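To make the idea of per-pixel combination concrete, the following is a minimal sketch, not the authors' method. It computes the standard steepest-ascent directions for the $\ell_\infty$ norm (the sign of the gradient) and the $\ell_2$ norm (the normalized gradient), then blends them pixel by pixel. The function names and the fixed per-pixel `weights` are hypothetical; the abstract says the per-pixel behavior is learned but does not say how.

```python
import math

def linf_direction(grad):
    """Steepest-ascent direction under an l_inf constraint: sign of each entry."""
    return [(g > 0) - (g < 0) for g in grad]

def l2_direction(grad):
    """Steepest-ascent direction under an l_2 constraint: the unit-norm gradient."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0 for _ in grad]
    return [g / norm for g in grad]

def mixed_direction(grad, weights):
    """Blend the two directions per pixel.

    weights[i] in [0, 1] is a hypothetical mixing coefficient for pixel i:
    1.0 uses only the l_inf direction, 0.0 uses only the l_2 direction.
    """
    d_inf = linf_direction(grad)
    d_2 = l2_direction(grad)
    return [w * a + (1.0 - w) * b for w, a, b in zip(weights, d_inf, d_2)]

# Toy gradient over three "pixels" (illustrative values only).
grad = [3.0, -4.0, 0.0]
weights = [1.0, 0.0, 0.5]
step = mixed_direction(grad, weights)
print(step)
```

With these toy values, the first pixel follows the $\ell_\infty$ direction, the second follows the $\ell_2$ direction, and the third receives no update because its gradient is zero.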
The introduction of high-quality image generation models, particularly the StyleGAN family, provides a powerful tool to synthesize and manipulate images. However, existing models are built upon high-quality (HQ) data as desired outputs, making them unfit for in-the-wild low-quality (LQ) images, which are common inputs for manipulation. In this work, we bridge this gap by proposing a novel GAN structure that allows for generating images with controllable quality. The network can synthesize various image degradations and restore the sharp image via a quality control code. Our proposed QC-StyleGAN can directly edit LQ images without altering their quality by applying GAN inversion and manipulation techniques. It also provides, for free, an image restoration solution that can handle various degradations, including noise, blur, compression artifacts, and their mixtures. Finally, we demonstrate numerous other applications such as image degradation synthesis, transfer, and interpolation.
Robots have been brought to work close to humans in many scenarios. For coexistence and collaboration, robots should be safe and pleasant for humans to interact with. To this end, robots could be both physically soft and equipped with multimodal sensing/perception, so that they have better awareness of the surrounding environment and can respond properly to humans' actions/intentions. This paper introduces a novel soft robotic link, named ProTac, that possesses multiple sensing modes: tactile and proximity sensing, based on computer vision and a functional material. These modalities come from a layered structure of a soft transparent silicon skin, a polymer dispersed liquid crystal (PDLC) film, and reflective markers. Here, the PDLC film can switch actively between the opaque and the transparent state, from which tactile sensing and proximity sensing can be obtained using only cameras built inside the ProTac link. In this paper, inference algorithms for tactile and proximity perception are introduced. Evaluation results of the two sensing modalities demonstrate that, with a simple activation strategy, the ProTac link can effectively perceive useful information from both approaching and in-contact obstacles. The proposed sensing device is expected to bring ultimate solutions for the design of robots with softness, whole-body and multimodal sensing, and safety control strategies.
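The COVID-19 pandemic has exposed the vulnerability of healthcare services worldwide, increasing the need to develop novel tools that provide rapid and cost-effective screening and diagnosis. Clinical reports indicate that COVID-19 infection may cause cardiac injury, and that electrocardiograms (ECG) may serve as a diagnostic biomarker for COVID-19. This study aims to use ECG signals to detect COVID-19 automatically. We propose a novel method to extract ECG signals from ECG paper records, which are then fed into a one-dimensional convolutional neural network (1D-CNN) to learn and diagnose the disease. To evaluate the quality of the digitized signals, R peaks in the paper-based ECG images are labeled. The RR intervals calculated from each image are then compared with the RR intervals of the corresponding digitized signal. Experiments on the COVID-19 ECG image dataset demonstrate that the proposed digitization method captures the original signals correctly, with a mean absolute error of 28.11 ms. Our proposed 1D-CNN model, trained on the digitized ECG signals, accurately identifies individuals with COVID-19 and other subjects, with classification accuracies of 98.42%, 95.63%, and 98.50% for COVID-19 vs. normal, COVID-19 vs. abnormal heartbeats, and COVID-19 vs. other classes, respectively. Furthermore, the proposed method also achieves a high level of performance on the multi-class classification task. Our findings indicate that a deep learning system trained on digitized ECG signals can serve as a potential tool for diagnosing COVID-19.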
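The RR-interval comparison described above can be illustrated with a small sketch. It assumes R-peak positions are available as timestamps in milliseconds and that the two peak lists are already aligned one to one. The function names and sample values are hypothetical and are not taken from the paper's data.

```python
def rr_intervals(r_peak_times_ms):
    """Return successive differences between R-peak timestamps (in ms)."""
    return [b - a for a, b in zip(r_peak_times_ms, r_peak_times_ms[1:])]

def mean_absolute_error(reference, estimate):
    """Mean absolute difference between two equally long interval lists."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("interval lists must be non-empty and equally long")
    return sum(abs(r - e) for r, e in zip(reference, estimate)) / len(reference)

# Hypothetical R-peak timestamps: labeled on the paper image vs. found in
# the digitized signal (illustrative numbers only).
image_peaks = [0, 800, 1620, 2400]
signal_peaks = [10, 790, 1640, 2410]

rr_image = rr_intervals(image_peaks)
rr_signal = rr_intervals(signal_peaks)
mae = mean_absolute_error(rr_image, rr_signal)
print(rr_image, rr_signal, mae)
```

A lower value means the digitized signal preserves the beat-to-beat timing of the paper record more faithfully.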
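Customer reviews play a crucial role in online shopping. People often refer to the reviews or comments of previous customers to decide whether to buy a new product. Exploiting this behavior, some people write untruthful reviews to deceive customers about the quality of products. These reviews are called spam reviews; they confuse consumers on online shopping platforms and negatively affect online shopping behavior. We propose a dataset called ViSpamReviews, built with a strict annotation procedure, for detecting spam reviews on e-commerce platforms. Our dataset consists of two tasks: a binary classification task for detecting whether a review is spam, and a multi-class classification task for identifying the type of spam. PhoBERT obtained the highest results on both tasks, with macro-averaged F1 scores of 88.93% and 72.17%, respectively.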
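For readers unfamiliar with the metric, here is a minimal sketch of macro-averaged F1: the F1 score is computed separately for each class and the per-class scores are averaged with equal weight. The function name and toy labels are illustrative and are not drawn from the dataset.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all labels seen."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy binary example: 0 = not spam, 1 = spam (illustrative labels only).
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

Because every class contributes equally, macro F1 penalizes poor performance on rare classes more than accuracy does, which matters when spam types are imbalanced.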
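Despite the recent rise in the performance of automatic speech recognition (ASR) methods, such methods do not ensure proper casing and punctuation in their outputs. This problem has a significant impact on comprehension, both for natural language processing (NLP) algorithms and for human readers. Capitalization and punctuation restoration is therefore imperative in pre-processing pipelines for raw text inputs. For low-resource languages such as Vietnamese, public datasets for this task are scarce. In this paper, we contribute a public dataset for capitalization and punctuation restoration in Vietnamese, and we propose a joint model for both tasks named JointCapPunc. Experimental results on the Vietnamese dataset show the effectiveness of our joint model compared with single-task models and a previous joint learning model. We publicly release our dataset and the implementation of our model at https://github.com/anhtunguyen98/jointcappund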
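Hardware-based acceleration is a widely used approach to speeding up many computationally intensive mathematical operations. This paper proposes an FPGA-based architecture to accelerate the convolution operation, a complex and expensive computing step that appears in many convolutional neural network models. We target the design at the standard convolution operation, intending to launch the product as an edge-AI solution. The purpose of the project is to produce an FPGA IP core that can process one convolutional layer at a time. System developers can deploy the IP core, since Verilog HDL is used as the primary design language for the architecture. The experimental results show that a single computing core synthesized on a simple edge-computing FPGA board can deliver 0.224 GOPS. When the board is fully utilized, 4.48 GOPS can be achieved.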
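As a software reference for the operation being accelerated, here is a minimal sketch of a standard 2D convolution as commonly used in CNN layers: a sliding dot product with stride 1 and no padding. It is not the paper's hardware design. The function names are hypothetical, and counting each multiply-accumulate as two operations is a common convention assumed here, not something stated in the abstract.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and
    accumulate the element-wise products at each position."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for ki in range(k_h):
                for kj in range(k_w):
                    acc += image[i + ki][j + kj] * kernel[ki][kj]
            row.append(acc)
        output.append(row)
    return output

def operation_count(image, kernel):
    """Operations for one pass, assuming 2 ops (multiply + add) per MAC."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    return 2 * out_h * out_w * k_h * k_w

# Toy 3x3 input and 2x2 kernel (illustrative values only).
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]
result = conv2d_valid(image, kernel)
ops = operation_count(image, kernel)
print(result, ops)
```

The four nested loops show why the operation is expensive: the work grows with the product of output size and kernel size, which is the kind of regular multiply-accumulate workload that parallel hardware handles well.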
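Recent work in spoken language modeling shows that it is possible to learn a language from raw audio in an unsupervised way, without any text labels. The approach relies first on transforming the audio into a sequence of discrete units (or pseudo-text) and then on training a language model directly on this pseudo-text. Is such a discrete bottleneck necessary, given that it may introduce irreversible errors in the encoding of the speech signal, or could we learn a language model without discrete units at all? In this work, we study the role of discrete versus continuous representations in spoken language modeling. We show that discretization is indeed essential for good results in spoken language modeling. We show that discretization removes linguistically irrelevant information from the continuous features, which helps to improve language modeling performance. On the basis of this study, we train a language model on discrete units of HuBERT features, reaching new state-of-the-art results on the lexical, syntactic, and semantic metrics of the Zero Resource Speech Challenge 2021 (Track 1 - Speech Only).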
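The discretization step can be pictured with a minimal sketch that maps each continuous feature frame to the index of its nearest centroid, producing a pseudo-text sequence of unit IDs. The abstract does not say how units are obtained, so nearest-centroid quantization (as with a k-means codebook) is an assumption here. The function names, the 2-dimensional frames, and the centroid values are all hypothetical.

```python
def nearest_unit(frame, centroids):
    """Return the index of the centroid closest to the frame
    (squared Euclidean distance; ties go to the lower index)."""
    best_index, best_dist = 0, float("inf")
    for index, centroid in enumerate(centroids):
        dist = sum((x - c) ** 2 for x, c in zip(frame, centroid))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

def discretize(frames, centroids):
    """Turn a sequence of continuous frames into discrete unit IDs."""
    return [nearest_unit(frame, centroids) for frame in frames]

# Hypothetical codebook of three units and four feature frames.
centroids = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [0.2, 1.9], [0.8, 0.7]]
units = discretize(frames, centroids)
print(units)
```

Frames that differ slightly but fall near the same centroid receive the same unit ID, which gives an intuition for how discretization can discard fine-grained variation in the continuous features.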